---
title: "Introduction to GLMs"
format:
  html: default
  revealjs:
    output-file: Intro-to-GLMs-slides.html
  pdf:
    output-file: Intro-to-GLMs-handout.pdf
---
# Introduction
---
{{< include shared-config.qmd >}}
## Welcome
Welcome to Epidemiology 204: Quantitative Epidemiology III (Statistical Models).
Epi 204 is a course on **regression modeling**.
## What you should already know {.scrollable}
---
{{< include prereq-knowledge.qmd >}}
## What we will cover in this course
---
* Linear (Gaussian) regression models (review and more details)
* Regression models for non-Gaussian outcomes
    + binary
    + count
    + time to event
* Statistical analysis using R

We will start where Epi 203 left off: with linear regression models.
## Motivations for regression models
---
:::{#exr-why-regression}
Why do we need regression models?
:::
---
::: {#sol-why-regression .incremental}
* when there's not enough data to analyze every subgroup of interest
  individually
    * especially when subgroups are defined using continuous predictors
:::
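
One way to see the problem with continuous predictors is to count how many Adelie penguins share each exact bill length. This is a minimal sketch, assuming the `palmerpenguins` and `dplyr` packages are installed; the names `adelie_counts`, `n_penguins`, and `n_bill_lengths` are illustrative and not part of the course materials.

```{r}
library(dplyr)

# Count Adelie penguins at each distinct bill length (a continuous predictor)
adelie_counts <-
  palmerpenguins::penguins |>
  filter(species == "Adelie", !is.na(bill_length_mm)) |>
  count(bill_length_mm, name = "n_penguins")

# Tabulate the subgroup sizes: how many bill lengths have 1, 2, 3, ... penguins
adelie_counts |>
  count(n_penguins, name = "n_bill_lengths")
```

When most subgroups contain only a handful of observations, subgroup-specific averages are very noisy. A regression model borrows information across neighboring values of the predictor.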
### Example: Adelie penguins
```{r}
#| label: fig-palmer-1
#| fig-cap: "Palmer penguins: body mass by bill length for Adelie penguins"
#| echo: false
library(ggplot2)
library(plotly)
library(dplyr)
ggpenguins <-
palmerpenguins::penguins |>
dplyr::filter(species == "Adelie") |>
ggplot(
) +
aes(x = bill_length_mm, y = body_mass_g) +
geom_point() +
xlab("Bill length (mm)") +
ylab("Body mass (g)")
print(ggpenguins)
```
### Linear regression
```{r}
#| label: fig-palmer-2
#| fig-cap: Palmer penguins with linear regression fit
ggpenguins2 <-
ggpenguins +
stat_smooth(
method = "lm",
formula = y ~ x,
geom = "smooth"
)
ggpenguins2 |> print()
```
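
The line drawn by `stat_smooth(method = "lm")` can also be fit directly with `lm()`, which gives access to the estimated coefficients. The following is a sketch under the assumption that `palmerpenguins` and `dplyr` are installed; the names `adelie` and `lm_adelie` are illustrative.

```{r}
# Keep only the Adelie penguins, as in the scatterplot
adelie <-
  palmerpenguins::penguins |>
  dplyr::filter(species == "Adelie")

# Straight-line model for mean body mass as a function of bill length;
# lm() drops rows with missing values by default
lm_adelie <- lm(body_mass_g ~ bill_length_mm, data = adelie)

# Estimated intercept (g) and slope (g per mm of bill length)
coef(lm_adelie)

# Estimated mean body mass for a penguin with a 40 mm bill
predict(lm_adelie, newdata = data.frame(bill_length_mm = 40))
```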
### Curved regression lines
```{r}
#| label: fig-palmer-3
#| fig-cap: Palmer penguins - curved regression lines
# Use a new name so the straight-line plot (ggpenguins2) is not overwritten;
# the axis labels are inherited from ggpenguins
ggpenguins3 <-
  ggpenguins +
  stat_smooth(
    method = "lm",
    formula = y ~ log(x),
    geom = "smooth"
  )
ggpenguins3 |> print()
```
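
In symbols, writing $Y$ for body mass and $x$ for bill length (notation chosen here for illustration), the curve in this figure comes from a model that is still linear in its coefficients:

$$
\text{E}[Y \mid X = x] = \beta_0 + \beta_1 \log(x)
$$

Only the predictor is transformed, so `lm()` can still estimate $\beta_0$ and $\beta_1$ by least squares.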
### Multiple regression
```{r}
#| label: fig-palmer-4
#| fig-cap: "Palmer penguins - multiple groups"
ggpenguins <-
palmerpenguins::penguins |>
ggplot(
aes(
x = bill_length_mm,
y = body_mass_g,
color = species
)
) +
geom_point() +
stat_smooth(
method = "lm",
formula = y ~ x,
geom = "smooth"
) +
xlab("Bill length (mm)") +
ylab("Body mass (g)")
ggpenguins |> print()
```
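
Mapping `color = species` makes `stat_smooth()` fit a separate line within each species. The same fitted lines come from a single model with a species-by-bill-length interaction. A minimal sketch, assuming `palmerpenguins` is installed (the name `lm_species` is illustrative):

```{r}
# One model with species-specific intercepts and slopes;
# `*` expands to both main effects plus their interaction
lm_species <- lm(
  body_mass_g ~ bill_length_mm * species,
  data = palmerpenguins::penguins
)

# Six coefficients: intercept and slope for the reference species,
# plus intercept and slope differences for the other two species
coef(lm_species)
```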
### Modeling non-Gaussian outcomes
::: {#fig-beetles_1a}
```{r}
library(glmx)
data(BeetleMortality)
beetles <- BeetleMortality |>
mutate(
pct = died / n,
survived = n - died
)
plot1 <-
beetles |>
ggplot(aes(x = dose, y = pct)) +
geom_point(aes(size = n)) +
xlab("Dose (log mg/L)") +
ylab("Mortality rate (%)") +
scale_y_continuous(labels = scales::percent) +
# xlab(bquote(log[10]), bquote(CS[2])) +
scale_size(range = c(1, 2))
print(plot1)
```
Mortality rates of adult flour beetles after five hours' exposure to gaseous carbon disulphide (Bliss 1935)
:::
### Why don't we use linear regression?
:::{#fig-beetles-plot1}
```{r}
beetles_long <-
beetles |>
reframe(
.by = everything(),
outcome = c(
rep(1, times = died),
rep(0, times = survived)
)
)
lm1 <-
beetles_long |>
lm(
formula = outcome ~ dose,
data = _
)
range1 <- range(beetles$dose) + c(-.2, .2)
f_linear <- function(x) predict(lm1, newdata = data.frame(dose = x))
plot2 <-
plot1 +
geom_function(fun = f_linear, aes(col = "Straight line")) +
labs(colour = "Model", size = "")
print(plot2)
```
Mortality rates of adult flour beetles after five hours' exposure to
gaseous carbon disulphide (Bliss 1935), with a straight-line regression fit
:::
### Zoom out
:::{#fig-beetles2}
```{r}
#| echo: false
print(plot2 + expand_limits(x = c(1.6, 2)))
```
Mortality rates of adult flour beetles after five hours' exposure to
gaseous carbon disulphide (Bliss 1935), with the straight-line fit extended beyond the observed doses
:::
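
The problem with the straight line can also be checked numerically: outside the observed doses, its fitted "probabilities" leave the interval from 0 to 1. The sketch below assumes the `glmx` package is installed and refits the line on the grouped data. Weighting by group size gives the same coefficients as ordinary least squares on one 0/1 row per beetle. The names `lm_grouped` and `extrapolated` are illustrative.

```{r}
library(glmx)
data(BeetleMortality)

# Weighted least squares on the observed mortality proportions
lm_grouped <- lm(died / n ~ dose, data = BeetleMortality, weights = n)

# Fitted values at the edges of the zoomed-out plot:
# the first is below 0 and the second is above 1
extrapolated <- predict(lm_grouped, newdata = data.frame(dose = c(1.6, 2)))
extrapolated
```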
### Log transformation of dose?
:::{#fig-beetles3}
```{r}
lm2 <-
beetles_long |>
lm(formula = outcome ~ log(dose), data = _)
f_linearlog <- function(x) predict(lm2, newdata = data.frame(dose = x))
plot3 <- plot2 +
expand_limits(x = c(1.6, 2)) +
geom_function(fun = f_linearlog, aes(col = "Log-transform dose"))
print(plot3)
```
Mortality rates of adult flour beetles after five hours' exposure to gaseous carbon disulphide (Bliss 1935), with straight-line fits on dose and on log-transformed dose
:::
### Logistic regression
:::{#fig-beetles4b}
```{r}
glm1 <- beetles |>
  glm(formula = cbind(died, survived) ~ dose, family = "binomial", data = _)
f <- function(x) {
glm1 |>
predict(newdata = data.frame(dose = x), type = "response")
}
plot4 <- plot3 + geom_function(fun = f, aes(col = "Logistic regression"))
print(plot4)
```
Mortality rates of adult flour beetles
after five hours' exposure to gaseous carbon disulphide (Bliss 1935), with linear and logistic regression fits
:::
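
In symbols, with $x$ denoting dose and $\pi(x)$ the probability of death at dose $x$ (notation assumed here for illustration), the logistic regression model fit above is:

$$
\pi(x) = \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)}
$$

Equivalently, the log-odds are linear in dose:

$$
\log\left\{\frac{\pi(x)}{1 - \pi(x)}\right\} = \beta_0 + \beta_1 x
$$

Because $\pi(x)$ stays strictly between 0 and 1 for every $x$, the fitted curve cannot produce impossible mortality rates, unlike the straight-line fits.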
## Structure of regression models
---
{{< include _sec_regression_defs.qmd >}}
---
:::{#exr-glm-structure}
What is the general structure of a generalized linear model?
:::
---
:::{#sol-glm-structure}
{{< include _sec_glm_structure.qmd >}}
:::
---